Automatic Recognition of Tibetan Buddhist Text by Computer
نویسندگان
چکیده
The purpose of this study is to develop a plausible method to code and compile Buddhist texts automatically from original Tibetan scripts into the Romanized form. We extract syllable from Tibetan texts and recognize automatically the Tibetan characters. The set of Tibetan characters consists of basic 30 consonants, 76 combination characters, and 4 vowels. Despite of the limited number of Tibetan characters, there are many similar characters in shape. Therefore, to separately recognize them we apply an Object Oriented Dictionary ( OOD ) which is created combining the categorization and character identification procedures. From our experiment, it is confirmed that it is possible to improve the rate of Tibetan character recognition dramatically by Object Oriented Method [ Ref. 1,3 ]. We would like to express our opinion on automatic character recognition for wooden blocked Tibetan manuscripts.
منابع مشابه
A character recognition scheme based on object oriented design for Tibetan buddhist texts
The purpose of this study is to develop a plausible method to code and compile Buddhist texts from original Tibetan scripts into Romanized form. Using GUI (Graphical User Interface) based on Object Oriented Design, a dictionary of Tibetan characters can be easily made for Buddhist literature researchers. It is hoped that a computer system capable of highly accurate character recognition will be...
متن کاملOff-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model
In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...
متن کاملThe multiple pronunciations in Taiwanese and the automatic transcription of Buddhist sutra with augmented read speech
Collection of Taiwanese text corpus with phonetic transcription suffers from the problems of multiple pronunciation, or pronunciation variation. By further augmenting the text with read speech, and using automatic speech recognition with a sausage searching net constructed from the multiple pronunciations of the text corresponding to its speech utterance, we are able to reduce the effort for ph...
متن کاملA survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
متن کاملResearch on Tibetan Automatic Word Segmentation
This paper researches on Tibetan automatic word segmentation. We focus on three key technologies of Tibetan automatic word segmentation: (1) a Tibetan automatic word segmentation approach is proposed, which is taking the advantage of case-auxiliary words and continuous feature. (2) a resolution method of overlapping ambiguity in Tibetan word segmentation is proposed, which is based on forward-b...
متن کامل